ORIGINS OF THE SIMPLEX METHOD
by George B. Dantzig

Stanford University


Abstract 


In the summer of 1947, when I first began to work on the simplex method for solving linear programs, the first idea that occurred to me is one that would occur to any trained mathematician, namely the idea of step by step descent (with respect to the objective function) along edges of the convex polyhedral set from one vertex to an adjacent one. I rejected this algorithm outright on intuitive grounds: it had to be inefficient because it proposed to solve the problem by wandering along some path of outside edges until the optimal vertex was reached. I therefore began to look for other methods which gave more promise of being efficient, such as those that went directly through the interior, [1].

Today we know that before 1947 four isolated papers had been published on special cases of the linear programming problem, by Fourier (1824) [5], de la Vallée Poussin (1911) [6], Kantorovich (1939) [7], and Hitchcock (1941) [8]. All except Kantorovich's paper proposed as a solution method descent along the outside edges of the polyhedral set, which is the way we describe the simplex method today. There is no evidence that these papers had any influence on each other. Evidently they sparked zero interest on the part of other mathematicians, and they were unknown to me when I first proposed the simplex method. As we shall see, the simplex algorithm evolved from a very different geometry, one in which it appeared to be very efficient.


The linear programming problem is to find

$$\min z,\quad x \ge 0 \ \text{ such that } \ Ax = b,\quad cx = z\ (\min),$$

where $x = (x_1, \dots, x_n)$, $A$ is an $m$ by $n$ matrix, and $b$ and $c$ are column and row vectors, respectively.
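For a tiny instance, the statement above can be checked by brute force. The sketch below is illustrative only (the helper names `solve_square` and `brute_force_lp` are hypothetical): it assumes $A$ has full row rank $m \le n$ and that the LP has a finite optimum, enumerates every choice of $m$ columns, solves for the corresponding basic variables in exact rational arithmetic, and keeps the cheapest nonnegative solution. It relies on the standard fact that such an optimum is attained at a basic feasible solution; its cost grows combinatorially with $n$, so it is a check, not a method.

```python
from fractions import Fraction
from itertools import combinations


def solve_square(M, rhs):
    """Gauss-Jordan elimination over Fractions; returns None if M is singular."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * p for a, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def brute_force_lp(A, b, c):
    """min cx s.t. Ax = b, x >= 0, by enumerating all bases (tiny problems only)."""
    m, n = len(A), len(A[0])
    best = None
    for cols in combinations(range(n), m):
        B = [[A[i][j] for j in cols] for i in range(m)]
        x_B = solve_square(B, b)
        if x_B is None or any(v < 0 for v in x_B):
            continue  # singular basis, or basic solution violates x >= 0
        x = [Fraction(0)] * n
        for j, v in zip(cols, x_B):
            x[j] = v
        z = sum(Fraction(cj) * xj for cj, xj in zip(c, x))
        if best is None or z < best[0]:
            best = (z, x)
    return best


# min 2x1 + 3x2 + x3  s.t.  x1 + x2 + x3 = 1,  x1 + 2x2 + 3x3 = 2,  x >= 0
z, x = brute_force_lp([[1, 1, 1], [1, 2, 3]], [1, 2], [2, 3, 1])
print(z, x)  # 3/2 [Fraction(1, 2), Fraction(0, 1), Fraction(1, 2)]
```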


Curiously enough, up to 1947, when I first proposed that a model based on linear inequalities be used for planning the activities of large-scale enterprises, linear inequality theory had produced only forty or so papers, in contrast to linear equation theory and the related subjects of linear algebra and approximation, which had produced a vast literature, [4]. Perhaps this disproportionate interest in linear equation theory was motivated, more than mathematicians care to admit, by its practical use as an important tool in engineering and physics, and by the belief that linear inequality systems would not be practical to solve unless they had three or fewer variables, [5].

My proposal served as a kind of trigger: ideas that had been brewing all through World War II but had never found expression burst forth like an explosion. Almost two years to the day after I first proposed that L.P. be used for planning, Koopmans organized the 1949 conference (now referred to as The Zero-th Symposium on Mathematical Programming) at the University of Chicago. There mathematicians, economists, and statisticians presented their research and produced a remarkable proceedings entitled Activity Analysis of Production and Allocation, [2]. L.P. soon became part of the newly developing professional fields of Operations Research and Management Science. Today thousands of linear programs are solved daily throughout the world to schedule industry. These involve many hundreds, thousands, and sometimes tens of thousands of equations and variables. Some mathematicians rank L.P. as “the newest yet most potent of mathematical tools” [16].

John von Neumann, Tjalling C. Koopmans, Albert W. Tucker, and others well known today, some just starting their careers back in the late 1940s, played important roles in L.P.'s early development. A group of young economists associated with Koopmans (R. Dorfman, K. Arrow, P. Samuelson, H. Simon, and others) became active contributors to the field. Their research on L.P. had a profound effect on economic theory, leading to Nobel Prizes. Another group led by A. W. Tucker, notably D. Gale and H. Kuhn, began the development of the mathematical theory.

This outpouring between the years 1947 and 1950 coincided with the first building of digital computers. The computer became the tool that made the application of linear programming possible. Everywhere we looked, we found practical applications that no one earlier could have posed seriously as optimization problems, because solving them by hand computation would have been out of the question. By good luck, clever algorithms in conjunction with computer development gave early promise that linear programming would become a practical science. The intense interest of the Defense Department in linear programming applications also had an important impact on the early construction of computers [17]. The U.S. National Bureau of Standards, with Pentagon funding, became a focal point for computer development under Sam Alexander; its Mathematics Group under John Curtis began the first experiments on techniques for solving linear programs, carried out primarily by Alan Hoffman, Theodore Motzkin, and others [3].

Since everywhere we looked we could see possible applications of linear programs, it seemed only natural to suppose that there was an extensive literature on the subject. To my surprise, my search of the contemporary literature of 1947 turned up only a few references on linear inequality systems and none on solving an optimization problem subject to linear inequality constraints.

T.S. Motzkin, in his definitive 1936 Ph.D. thesis on linear inequalities [4], makes no mention of optimizing a function subject to a system of linear inequalities. However, 15 years later, at the First Symposium on Linear Programming (June 1951), Motzkin declared: “there have been numerous rediscoveries [of LP] partly because of the confusingly many different geometric interpretations which these problems admit”. He went on to say that different geometric interpretations allow one “to better understand and sometimes to better solve cases of these problems as they appeared and developed from a first occurrence in Newton’s Methodus Fluxionum to right now”.

The “numerous rediscoveries” that Motzkin referred to were probably the two or three papers already cited, concerned with finding the least sum of absolute deviations, minimizing the maximum deviation of linear systems, or determining whether there exists a solution to a system of linear inequalities. Fourier pointed out as early as 1824 that these were all equivalent problems, [5]. Linear programs, however, had also appeared in other guises. In 1928, von Neumann [19] formulated the zero-sum matrix game and proved the famous Mini-Max Theorem, a forerunner of the famous Duality Theorem of Linear Programming (also due to him) [11]. In 1936, Neyman and Pearson considered the problem of finding an optimal critical region for testing a statistical hypothesis. Their famous Neyman-Pearson Lemma is a statement about the Lagrange multipliers associated with an optimal solution to a linear program, [20].

After I had searched the contemporary literature of 1947 and found nothing, I made a special trip to Chicago in June 1947 to visit T.C. Koopmans to see what economists knew about the problem. As a result of that meeting, Leonid Hurwicz, a young colleague of Koopmans, visited me in the Pentagon that summer and collaborated with me on my early work on the simplex algorithm, a method which we described at the time as “climbing up the bean pole”, since we were maximizing the objective.


Later  I  made  another  special  trip,  this  one  to  Princeton  in  the  fall  of  1947,  to  visit  the 
great  mathematician  Johnny  von  Neumann  to  learn  what  mathematicians  knew  about  the 
subject.  This  was  after  I  had  already  proposed  the  simplex  method  but  before  I  realized 
how  very  efficient  it  was  going  to  be,  [1]. 

The origins of the simplex method go back to one of two famous unsolved problems in mathematical statistics proposed by Jerzy Neyman, which I mistakenly solved as a homework problem; it later became part of my Ph.D. thesis at Berkeley, [9]. Today we would describe this problem as proving the existence of optimal Lagrange multipliers for a semi-infinite linear program with bounded variables. Given a sample space $\Omega$ whose sample points $u$ have a known probability distribution $dP(u)$ in $\Omega$, the problem I considered was to prove the existence of a critical region $\omega$ in $\Omega$ that satisfied the conditions of the Neyman-Pearson Lemma. More precisely, the problem concerned finding a region $\omega$ in $\Omega$ that minimized the Lebesgue-Stieltjes integral defined by (4) below, subject to (2) and (3):


$$\int_{u \in \omega} dP(u) = \alpha, \tag{2}$$

$$\alpha^{-1} \int_{u \in \omega} f(u)\, dP(u) = b, \tag{3}$$

$$\alpha^{-1} \int_{u \in \omega} g(u)\, dP(u) = z\ (\min), \tag{4}$$

where $0 < \alpha < 1$ is the specified “size” of the region; $f(u)$ is a given vector function of $u$ with $m - 1$ components whose expected value over $\omega$ is specified by the vector $b$; and $g(u)$ is a given scalar function of $u$ whose unknown expected value $z$ over $\omega$ is to be minimized.

Instead of finding a critical region, we can try to find its characteristic function $\phi(u)$, with the property that $\phi(u) = 1$ if $u \in \omega$ and $\phi(u) = 0$ if $u \notin \omega$. The original problem can then be restated as:

Find min $z$ and a function $\phi(u)$ for $u \in \Omega$ such that:

$$\int_{u \in \Omega} \phi(u)\, dP(u) = \alpha, \qquad 0 \le \phi(u) \le 1, \tag{5}$$

$$\alpha^{-1} \int_{u \in \Omega} \phi(u) f(u)\, dP(u) = b, \tag{6}$$

$$\alpha^{-1} \int_{u \in \Omega} \phi(u) g(u)\, dP(u) = z\ (\min). \tag{7}$$

A discrete analog of this semi-infinite linear program can be obtained by selecting $n$ representative sample points $u^1, \dots, u^j, \dots, u^n$ in $\Omega$ and replacing $dP(u^j)$ by discrete point probabilities $\lambda_j > 0$, where $n$ may be finite or infinite. Setting

$$x_j = (\lambda_j/\alpha)\, \phi(u^j), \qquad 0 \le x_j \le \lambda_j/\alpha, \tag{8}$$

the approximation problem becomes the bounded variable LP: find min $z$ and $0 \le x_j \le \lambda_j/\alpha$ such that

$$\sum_{j=1}^{n} x_j = 1, \tag{9}$$

$$\sum_{j=1}^{n} A_{\cdot j}\, x_j = b, \tag{10}$$

$$\sum_{j=1}^{n} c_j x_j = z\ (\min), \tag{11}$$

where $f(u^j) = A_{\cdot j}$ are $(m-1)$-component column vectors, and $g(u^j) = c_j$.
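The bounded-variable LP (8)-(11) can be illustrated in its simplest special case, which is an assumption made only for this sketch: there is no vector function $f$, so $m - 1 = 0$ and constraint (10) disappears. The LP then has the single equation (9) plus the bounds $0 \le x_j \le \lambda_j/\alpha$, and filling the points with the smallest $c_j = g(u^j)$ first is optimal (a continuous-knapsack argument). Converting back with $\phi(u^j) = \alpha x_j/\lambda_j$ puts the sample points with the smallest $g$ into the critical region, with at most one point included fractionally. This greedy shortcut is not the simplex method, and the name `discrete_critical_region` is hypothetical.

```python
from fractions import Fraction


def discrete_critical_region(lam, g, alpha):
    """Special case of (8)-(11) with no f-constraints (m - 1 = 0).

    Minimizes sum(c_j x_j) subject to sum(x_j) = 1 and 0 <= x_j <= lam_j / alpha,
    where c_j = g(u^j). Returns (z, x, phi) with phi_j = alpha * x_j / lam_j.
    """
    alpha = Fraction(alpha)
    order = sorted(range(len(g)), key=lambda j: g[j])   # cheapest points first
    x = [Fraction(0)] * len(g)
    remaining = Fraction(1)                              # right-hand side of (9)
    for j in order:
        if remaining == 0:
            break
        upper = Fraction(lam[j]) / alpha                 # bound from (8)
        x[j] = min(upper, remaining)
        remaining -= x[j]
    if remaining > 0:
        raise ValueError("infeasible: total probability is less than alpha")
    z = sum(Fraction(gj) * xj for gj, xj in zip(g, x))
    phi = [xj * alpha / Fraction(lj) for xj, lj in zip(x, lam)]
    return z, x, phi


lam = [Fraction(1, 4)] * 4        # four equally likely sample points
g = [3, 1, 4, 2]                  # c_j = g(u^j)
z, x, phi = discrete_critical_region(lam, g, Fraction(3, 8))
print(z, phi)  # 4/3 [Fraction(0, 1), Fraction(1, 1), Fraction(0, 1), Fraction(1, 2)]
```

In the example the point with $g = 1$ is fully in the region and the point with $g = 2$ is half in, which is the randomized boundary point familiar from discrete Neyman-Pearson tests.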

Since $n$, the number of discrete $j$, could be infinite, I found it more convenient to analyze the L.P. problem in the geometry of the finite $(m+1)$-dimensional space associated with the coefficients in a column. I did so initially with the convexity constraint (9) but with no explicit upper bound on the non-negative variables $x_j$, [10], [2], [11]. Since the first coefficient in a column (the one corresponding to (9)) is always 1, my analysis omitted the initial 1 coordinate. Each column $(A_{\cdot j}, c_j)$ then becomes a point $(y, z)$ in $R^m$, where $y = (y_1, \dots, y_{m-1})$ has $m - 1$ coordinates.

The problem can now be interpreted geometrically as one of assigning weights $x_j \ge 0$ to the $n$ points $(y^j, z^j) = (A_{\cdot j}, c_j)$ in $R^m$ so that the “center of gravity” of these points, see Figure 1, lies on the vertical “requirement” line $(b, z)$ and such that its $z$ coordinate is as small as possible.
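A small numerical illustration of this interpretation, assuming $m - 1 = 1$ so that each column is a point $(y, z)$ in the plane (the helper `center_of_gravity` and the three points are hypothetical): two different nonnegative weightings that sum to 1 both place the center of gravity on the requirement line $y = b = 1$, and the LP prefers the one with the lower $z$ coordinate.

```python
from fractions import Fraction


def center_of_gravity(points, weights):
    """Weighted average of points (y, z); y is a tuple with m - 1 coordinates.

    The weights must be nonnegative and sum to 1 (the convexity constraint (9)).
    """
    if any(w < 0 for w in weights) or sum(weights) != 1:
        raise ValueError("weights must be nonnegative and sum to 1")
    dim = len(points[0][0])
    y_bar = tuple(sum(w * y[k] for (y, _), w in zip(points, weights)) for k in range(dim))
    z_bar = sum(w * z for (_, z), w in zip(points, weights))
    return y_bar, z_bar


points = [((0,), 4), ((1,), 1), ((2,), 3)]          # (A_j, c_j) with m - 1 = 1
half = Fraction(1, 2)
print(center_of_gravity(points, [half, 0, half]))   # ((Fraction(1, 1),), Fraction(7, 2))
print(center_of_gravity(points, [0, 1, 0]))         # ((1,), 1)
```

Both weightings satisfy (10) with $b = 1$; the second gives the lower center of gravity, $z = 1$ rather than $7/2$.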

Simplex  Algorithm 

Step $t$ of the algorithm begins with an $(m-1)$-simplex, see Figure 1, defined by some $m$ points $(A_{\cdot j_i}, c_{j_i})$ for $i = 1, \dots, m$ and $m$ weights $x_{j_i} > 0$ (in the non-degenerate case) such that $\sum_i A_{\cdot j_i} x_{j_i} = b$. In the figure, the vertices of the $m - 1 = 2$ dimensional simplex correspond to $j_1 = 1$, $j_2 = 2$, $j_3 = 3$. The line $(b, z)$ intersects the plane of the simplex (the triangle in the figure) in an interior point $(b, z_t)$. A point $(A_{\cdot s}, c_s)$ is then determined whose vertical distance below this “solution” plane of the simplex is maximum.


[Figure 1 labels: Solution Plane of (m-1) Simplex; Requirement Line; $(b, 0)$; $(b, z_{t+1})$]

Figure 1. The m Dimensional Simplex

Algebraically, the equation $z = \pi y + \pi_0$ of the plane associated with the simplex is found by solving the system of $m$ equations $\pi A_{\cdot j_i} + \pi_0 = c_{j_i}$, $i = 1, \dots, m$. Next, let $j = s$ be the index of $(A_{\cdot s}, c_s)$, the point most below this plane, namely

$$s = \arg\min_j \left[c_j - (\pi A_{\cdot j} + \pi_0)\right]. \tag{12}$$

If $c_s - (\pi A_{\cdot s} + \pi_0)$ turns out to be non-negative, the iterative process stops; otherwise the $m$-simplex, the tetrahedron in Figure 1, is formed as the convex combination of the point $(A_{\cdot s}, c_s)$ and the $(m-1)$-simplex. The requirement line $(b, z)$ intersects this $m$-simplex in a segment from $(b, z_{t+1})$ to $(b, z_t)$, where $z_{t+1} < z_t$. The face containing $(b, z_{t+1})$ is then selected as the new $(m-1)$-simplex. Operationally, the point $(A_{\cdot s}, c_s)$ replaces $(A_{\cdot j_r}, c_{j_r})$ for some $r$. The index $r$ is not difficult to determine algebraically.
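The step just described can be sketched in exact-arithmetic Python. This is an illustrative reconstruction, not Dantzig's code: it assumes the convexity form (9)-(11) with no upper bounds, a nonsingular starting basis whose weights are nonnegative, the most-negative rule (12) for choosing $s$, and the standard minimum-ratio test for $r$. It has no anti-cycling safeguard and re-solves the $m \times m$ systems from scratch at every step. The names `solve` and `column_simplex` are hypothetical.

```python
from fractions import Fraction


def solve(M, rhs):
    """Solve M v = rhs exactly by Gauss-Jordan elimination (M square, nonsingular)."""
    n = len(M)
    aug = [[Fraction(a) for a in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * p for a, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def column_simplex(points, b, basis, max_iter=100):
    """Column-geometry simplex: sum x_j = 1, sum A_j x_j = b, min sum c_j x_j, x >= 0.

    points: list of (A_j, c_j) with A_j a tuple of m - 1 numbers.
    basis:  m indices whose weights for (1, b) are nonnegative.
    Returns (z, {j: x_j}, basis, number_of_replacements).
    """
    def column(j):
        return [1] + list(points[j][0])        # prepend the convexity coefficient

    rhs = [1] + list(b)
    basis = list(basis)
    m = len(basis)
    for it in range(max_iter):
        B = [[column(j)[i] for j in basis] for i in range(m)]
        x_B = solve(B, rhs)                     # weights on the current simplex
        BT = [list(row) for row in zip(*B)]
        pi = solve(BT, [points[j][1] for j in basis])  # plane z = pi_0 + pi . y
        # d_j = c_j - (pi . A_j + pi_0): vertical distance of point j above the plane
        d = [c - sum(p * a for p, a in zip(pi, column(j)))
             for j, (_, c) in enumerate(points)]
        s = min(range(len(points)), key=d.__getitem__)  # rule (12)
        if d[s] >= 0:
            z = sum(x * points[j][1] for x, j in zip(x_B, basis))
            return z, dict(zip(basis, x_B)), basis, it
        v = solve(B, column(s))                 # B v = column of the entering point
        # The first row gives sum(v) = 1, so some v_i > 0 and the ratio test succeeds.
        _, r = min((x_B[i] / v[i], i) for i in range(m) if v[i] > 0)
        basis[r] = s                            # (A_s, c_s) replaces (A_{j_r}, c_{j_r})
    raise RuntimeError("iteration limit reached")


points = [((0,), 4), ((1,), 1), ((2,), 3 - 1), ((3,), 0), ((4,), 5)]
print(column_simplex(points, (2,), [0, 4]))
# (Fraction(1, 2), {1: Fraction(1, 2), 3: Fraction(1, 2)}, [1, 3], 2)
```

With these five made-up points and $b = 2$, starting from the pair $j = 0, 4$, the sketch performs two replacements and ends with weight $1/2$ on each of $j = 1$ and $j = 3$, giving $z = 1/2$.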

Geometrical insight as to why the simplex method is efficient can be gained by viewing the algorithm in two dimensions, see Figure 2. Suppose a piecewise linear function $z = f(y)$ is defined by the underbelly of the convex hull of the points $(y^j, z^j) = (A_j, c_j)$. We wish to solve the equation $z = f(b)$ and to find two points $(y^j, z^j)$, $(y^k, z^k)$ and the weights $(\lambda, \mu) \ge 0$ on these two points such that $\lambda y^j + \mu y^k = b$, $\lambda + \mu = 1$, $\lambda z^j + \mu z^k = f(b)$. In the two dimensional case, the simplex method resembles a kind of secant method in which, given any slope $\sigma$, it is cheap to find a point $(y^*, z^*)$ of the underbelly such that the slope (actually the slope of a support) at $y^*$ is $\sigma$, but where it is not possible, given $b$, to directly compute $z = f(b)$.


Figure  2.  The  Under-belly  of  the  Convex  Hull 


In Figure 2, the algorithm is initiated (in Phase II of the simplex method) by two points, say $(y^1, z^1)$ and $(y^6, z^6)$, on opposite sides of the requirement line. The slope of the “solution” line joining them is $\sigma_1$. Next, one determines that the point $(y^5, z^5)$ is the one most below the line joining $(y^1, z^1)$ to $(y^6, z^6)$ with slope $\sigma_1$. This is done algebraically by simply substituting the coordinates $(y^j, z^j)$ into the equation of the solution line, $z - z^6 = \sigma_1 (y - y^6)$, and finding the point $j = s$ such that $\sigma_1 (y^j - y^6) - (z^j - z^6)$ is maximum. For the example above, $s = 5$, and thus $(y^5, z^5)$ replaces $(y^6, z^6)$. The steps are then repeated with $(y^1, z^1)$ and $(y^5, z^5)$. The algorithm finds the optimum point $(b, z^*)$ in two iterations with the pair $(y^3, z^3)$, $(y^5, z^5)$.
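The two-dimensional procedure reduces to a short loop. This sketch assumes the two starting points have distinct $y$ values, one on each side of $b$ ($y^{\text{left}} \le b \le y^{\text{right}}$), and it lets the new point replace the endpoint on its own side of the requirement line; in this two-point case that is exactly what the minimum-ratio test selects, and it keeps the pair straddling $b$. The function name `underbelly_value` is hypothetical, and the point set below is made up rather than read off Figure 2.

```python
from fractions import Fraction


def underbelly_value(points, b, left, right, max_iter=100):
    """Two-point column-geometry simplex computing z = f(b) on the underbelly.

    points: list of (y_j, z_j); left/right: indices with y_left < y_right and
    y_left <= b <= y_right. Returns (f(b), left, right, replacements).
    """
    b = Fraction(b)
    for it in range(max_iter):
        (yl, zl), (yr, zr) = points[left], points[right]
        sigma = Fraction(zr - zl) / (yr - yl)       # slope of the solution line
        # sigma*(y_j - y_r) - (z_j - z_r) > 0 means point j lies below the line
        below = [sigma * (y - yr) - (z - zr) for y, z in points]
        s = max(range(len(points)), key=below.__getitem__)
        if below[s] <= 0:                           # no point below: optimal pair
            lam = (yr - b) / (yr - yl)              # weight on the left point
            return lam * zl + (1 - lam) * zr, left, right, it
        if points[s][0] <= b:                       # keep the pair straddling b
            left = s
        else:
            right = s
    raise RuntimeError("iteration limit reached")


points = [(0, 4), (1, 1), (2, 2), (3, 0), (4, 5)]
print(underbelly_value(points, 2, left=0, right=4))  # (Fraction(1, 2), 1, 3, 2)
```

As in the text's example, the optimum is reached in two iterations: the point at $y = 3$ first replaces the right endpoint, then the point at $y = 1$ replaces the left one.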


In practical applications, one would expect that most of the points $(A_{\cdot j}, c_j)$ would lie above the underbelly of their convex hull. We would therefore expect that very few $j$ would be extreme points of the underbelly. Since the algorithm only chooses $(A_{\cdot s}, c_s)$ from among the latter, and these typically would be rare, I conjectured that the algorithm would have very few choices and would take about $m$ steps in practice.

It is, of course, not difficult to construct cases that take more than $m$ iterations, so let me make some remarks about the rate of convergence of $z_t$ to $z^*$, the minimum value of $z$, in the event that the method takes more than $m$ iterations.

Convergence  Rate  of  the  Simplex  Method 

Assume there exists a constant $\theta > 0$ such that for every iteration $\tau$, the values of all basic variables $x^\tau_{j_i}$ satisfy

$$x^\tau_{j_i} \ge \theta > 0 \quad \text{for all } j_i. \tag{13}$$

At the start of iteration $t$, by eliminating the basic variables from the objective equation, we obtain

$$z_{t-1} - z = \sum_j (-\bar c_j)\, x_j, \tag{14}$$

where $\bar c_j = 0$ for all basic $j = j_i$. If $(-\bar c_s) = \max_j (-\bar c_j) \le 0$, the iterative process stops with the current basic feasible solution optimal. Otherwise, we increase the non-basic $x_s$ to $x_s = \theta_t \ge \theta$ and adjust the basic variables to obtain the basic feasible solution that starts iteration $t + 1$.

Let $z^* = \min z$ and let $x_j = x^*_j \ge 0$ be the corresponding optimal $x_j$. We define $\Delta_t = z_t - z^*$.

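Theorem. Independent of $n$, the number of variables,

$$\frac{\Delta_t}{\Delta_0} \le (1 - \theta_1)(1 - \theta_2) \cdots (1 - \theta_t) \le e^{-\sum_\tau \theta_\tau} \le e^{-\theta t}, \tag{15}$$

where $\theta_t \ge \theta > 0$ is the value of the incoming basic variable $x_s$ on iteration $t$.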

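Proof.

$$\Delta_{t-1} = z_{t-1} - z^* = \sum_j (-\bar c_j)\, x^*_j \le (-\bar c_s) \sum_j x^*_j = (-\bar c_s), \tag{16}$$

where the inequality holds because $-\bar c_j \le -\bar c_s$ and $x^*_j \ge 0$ for every $j$, and the last equality uses $\sum_j x^*_j = 1$ from (9).

$$\Delta_{t-1} - \Delta_t = z_{t-1} - z_t = (-\bar c_s)\, x_s = (-\bar c_s)\, \theta_t \ge \Delta_{t-1}\, \theta_t, \tag{17}$$

where the inequality between the last two terms is obtained by multiplying (16) by $\theta_t$. Rearranging terms,

$$\Delta_t \le (1 - \theta_t)\, \Delta_{t-1} \le e^{-\theta_t} \Delta_{t-1} \le e^{-\theta} \Delta_{t-1}, \tag{18}$$

where the middle step uses $1 - \theta_t \le e^{-\theta_t}$, and (15) follows. ∎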
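The chain of inequalities in (15) is easy to check numerically. The sketch below (the helper name `theorem_bounds` and the sample values are hypothetical) takes incoming values $\theta_1, \dots, \theta_t$, assumed to lie in $(0, 1]$ as they must under (9), runs the largest decrease ratio that recursion (18) permits, $\Delta_t = (1 - \theta_t)\Delta_{t-1}$, and evaluates the product bound, $e^{-\sum_\tau \theta_\tau}$, and $e^{-\theta t}$ with $\theta$ taken as the smallest $\theta_\tau$.

```python
import math


def theorem_bounds(thetas, delta0=1.0):
    """Evaluate the terms of (15) for incoming values theta_1, ..., theta_t.

    Each theta_tau is assumed to lie in (0, 1], as it must when the x_j sum to 1.
    Returns (worst-case Delta_t / Delta_0, product bound, exp(-sum), exp(-theta * t)).
    """
    if not thetas or any(not 0 < th <= 1 for th in thetas):
        raise ValueError("need at least one theta, each in (0, 1]")
    delta = delta0
    for th in thetas:            # worst case of (18): Delta_t = (1 - theta_t) Delta_{t-1}
        delta *= 1 - th
    product = math.prod(1 - th for th in thetas)
    exp_sum = math.exp(-sum(thetas))
    exp_min = math.exp(-min(thetas) * len(thetas))   # theta taken as the smallest theta_tau
    return delta / delta0, product, exp_sum, exp_min


ratio, product, exp_sum, exp_min = theorem_bounds([0.5, 0.25, 0.2])
print(round(product, 4), round(exp_sum, 4), round(exp_min, 4))  # 0.3 0.3867 0.5488
```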




Corollary. Assuming $\theta_\tau$ has “on the average” the same value as any other basic variable, namely $1/m$, the expected number of iterations $t$ required to effect an $e^{-k}$-fold decrease in $\Delta_0$ will be less than $km$ iterations, i.e.,

$$\frac{\Delta_t}{\Delta_0} \le e^{-\sum_\tau \theta_\tau} = e^{-t/m}. \tag{19}$$

Thus, under the assumption that the value of the incoming variable is $1/m$ on the average, a thousand-fold decrease in $\Delta_t = z_t - z^*$ could be expected to be obtained in less than $7m$ iterations, because $e^{-7} < .001$.
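The arithmetic behind this estimate can be reproduced directly. Under the corollary's assumption that $\sum_\tau \theta_\tau \approx t/m$, an $F$-fold decrease requires $e^{-t/m} \le 1/F$, i.e., $t \ge m \ln F$. The sketch below (hypothetical helper name; $m = 100$ is chosen only for illustration) reports the whole number $k = \lceil \ln F \rceil$, the resulting bound $km$, and the smallest integer $t$.

```python
import math


def iterations_for_decrease(fold, m):
    """Iterations for a `fold`-fold decrease in Delta if Delta_t/Delta_0 = exp(-t/m)."""
    k = math.ceil(math.log(fold))          # smallest whole k with e^{-k} <= 1/fold
    t_min = math.ceil(m * math.log(fold))  # smallest integer t with e^{-t/m} <= 1/fold
    return k, k * m, t_min


print(math.exp(-7))                        # about 0.000912, below .001
print(iterations_for_decrease(1000, 100))  # (7, 700, 691)
```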

It  was  considerations  such  as  these  that  led  me  back  in  1947  to  believe  that  the  simplex 
method  would  be  very  efficient. 

It was fortunate that back in 1947, when algorithms for solving linear programs were first being developed, the column geometry and not the row geometry was used. As we have seen, the column geometry suggested a very different algorithm, one that promised to be very efficient. Accordingly, I developed a variant of the algorithm without the convexity constraint (9) and arranged in the fall of 1947 to have the Bureau of Standards test it on George Stigler’s nutrition problem [14]. Of course, I soon observed that what appeared in the column geometry to be a new algorithm was, in the row geometry, the vertex descending algorithm that I had rejected earlier.

It is my opinion that any well trained mathematician viewing the linear programming problem in the row geometry of the variables would have immediately come up with the idea of solving it by a vertex descending algorithm, as did Fourier, de la Vallée Poussin, and Hitchcock before me, each of us proposing it independently of the others. I believe, however, that if anyone had to consider it as a practical method, as I had to, he would have quickly rejected it on intuitive grounds as a very stupid idea without merit. My own contributions towards the discovery of the simplex method were (1) independently proposing the algorithm, (2) initiating the development of the software necessary for its practical use, and (3) observing, by viewing the problem in the geometry of the columns rather than the rows, that, contrary to geometric intuition, following a path on the outside of the convex polyhedron might be a very efficient procedure.

The  Role  of  Sparsity  in  the  Simplex  Method 

To determine $s = \arg\min_j [c_j - (\pi A_{\cdot j} + \pi_0)]$ requires forming the scalar product of the two vectors $\pi$ and $A_{\cdot j}$ for each $j$. This “pricing out” operation, as it is called, is usually very cheap because the vectors $A_{\cdot j}$ are sparse, i.e., they typically have few non-zero coefficients (perhaps on the average 4 or 5 non-zeros). Nevertheless, if the number of columns $n$ is large, say several thousand, pricing can use up a lot of CPU time. (Parallel processors could be used very effectively for pricing by assigning subsets of the columns to different processors, [18].)
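Pricing out with sparse columns can be sketched as follows, assuming each column $A_{\cdot j}$ is stored as a dictionary of its non-zero entries (a representation chosen here for illustration; production codes use packed column arrays). The work per column is proportional to its number of non-zeros rather than to $m$, and because each $d_j$ is computed independently, disjoint slices of `columns` could be priced by different processors. The names `price_out` and `choose_entering` are hypothetical.

```python
def price_out(columns, costs, pi, pi0):
    """Reduced costs d_j = c_j - (pi . A_j + pi0) for sparse columns.

    columns: list of dicts {row index: non-zero coefficient}.
    Only the stored non-zeros are touched.
    """
    return [c - (sum(pi[i] * a for i, a in col.items()) + pi0)
            for col, c in zip(columns, costs)]


def choose_entering(columns, costs, pi, pi0):
    """Rule (12): index of the most negative d_j, or None if every d_j >= 0."""
    d = price_out(columns, costs, pi, pi0)
    s = min(range(len(d)), key=d.__getitem__)
    return s if d[s] < 0 else None


columns = [{0: 1, 2: 4}, {1: 3}, {0: 2, 1: 1}]   # three sparse columns, three rows
costs = [6, -1, 5]
pi, pi0 = [1, 2, 1], 0
print(price_out(columns, costs, pi, pi0))        # [1, -7, 1]
print(choose_entering(columns, costs, pi, pi0))  # 1
```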

On single processors, various partial pricing schemes are used. One scheme, used in the MINOS software system, is to partition the columns into subsets of some $k$ columns each, [12]. The choice of $s$ is restricted to columns that price out negative among the first $k$ until there are none, and then moves on to the next $k$, etc. Another scheme is to price out all the columns and rank them by how negative they price out. A subset of the $j$, say the fifty most negative in rank, is then used to iteratively select $s$ until this subset no longer has a column that prices out negative. Then a new subset is generated for selecting $s$ and the process is repeated. Partial pricing schemes are very effective when $n$ is large, especially for matrix structures that contain so-called “GUB” (Generalized Upper Bound) rows, [13].
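The two partial pricing schemes can be sketched as small selector objects that call back into the current pricing (a function returning $d_j$ for the current $\pi$). This is an illustration under stated assumptions, not MINOS code: the block size `k`, the list size (50 by default, following the example above), the wrap-around order of the blocks, and the class names are all choices made here.

```python
class SectionalPricer:
    """Scan columns in consecutive blocks of k; stay in a block while some column
    in it prices out negative, otherwise move on to the next block."""

    def __init__(self, n, k):
        self.n, self.k, self.start = n, k, 0

    def select(self, reduced_cost):
        n_blocks = -(-self.n // self.k)              # ceil(n / k)
        for _ in range(n_blocks):                    # at most one full pass
            block = range(self.start, min(self.start + self.k, self.n))
            d = {j: reduced_cost(j) for j in block}
            s = min(d, key=d.get)
            if d[s] < 0:
                return s
            self.start = self.start + self.k if self.start + self.k < self.n else 0
        return None                                  # nothing prices out negative


class CandidateListPricer:
    """Price all columns, keep the `size` most negative, and draw s from that
    list until none of its members prices out negative; then rebuild it."""

    def __init__(self, n, size=50):
        self.n, self.size, self.candidates = n, size, []

    def select(self, reduced_cost):
        for rebuilt in (False, True):
            d = {j: reduced_cost(j) for j in self.candidates}
            negative = [j for j in self.candidates if d[j] < 0]
            if negative:
                return min(negative, key=d.get)
            if rebuilt:
                return None                          # full pricing found nothing negative
            ranked = sorted((reduced_cost(j), j) for j in range(self.n))
            self.candidates = [j for dj, j in ranked[: self.size] if dj < 0]


d = [3, 1, 0, -5, 2, -2, 4]
sectional = SectionalPricer(n=7, k=3)
print(sectional.select(lambda j: d[j]))              # 3 (block 0-2 has no negative d_j)
cand = CandidateListPricer(n=7, size=2)
print(cand.select(lambda j: d[j]), cand.candidates)  # 3 [3, 5]
d[3] = 0                                             # suppose x_3 entered the basis
print(cand.select(lambda j: d[j]))                   # 5, taken from the existing list
```

The candidate-list version prices all $n$ columns only when its short list runs dry, which is where the savings come from when $n$ is large.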


Besides the pricing-out of the columns, the simplex method requires that the current basis $B$, i.e., the columns $(j_1, \dots, j_m)$ used to form the simplex in Figure 1, be maintained from iteration $t$ to $t + 1$ in a form that makes it easy to compute two vectors $v$ and $\pi$, where $Bv = A_{\cdot s}$ and $\pi B = (c_{j_1}, \dots, c_{j_m})$. The matrix $B$ is typically very sparse. In problems where the number of rows $m > 1{,}000$, the percentage of non-zeros may be a small fraction of one percent. Even for such $B$, it is not practical to maintain $B^{-1}$ explicitly, because it could turn out to be 100% dense. Instead, $B$ is often represented as the product of a lower and an upper triangular matrix, where each is maintained as a product of elementary matrices, with every effort being made to keep the single non-unit column of these elementary matrices as sparse as possible. Maintaining this sparsity is important; otherwise, for the case of $m = 1{,}000$, the algorithm would have to manipulate data sets with millions of non-zero numbers, and solving systems $Bv = A_{\cdot s}$ in order to determine which variable leaves the basis would become too costly.
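The following sketch shows why elementary matrices with a single non-unit column are convenient. It deliberately swaps in the older product form of the inverse ($B^{-1} = E_k \cdots E_1$, starting from $B = I$) instead of the lower/upper triangular representation described above, because it is shorter; what it shares with that representation is the sparse storage of each non-unit column and the solution of $Bv = A_{\cdot s}$ by applying the stored factors in sequence. The class and method names are hypothetical.

```python
from fractions import Fraction


class EtaFile:
    """Product-form representation B^{-1} = E_k ... E_1, starting from B = I.

    Each elementary matrix E differs from I only in column r; only the
    non-zeros of that column are stored, as (r, {i: eta_i}).
    """

    def __init__(self, m):
        self.m = m
        self.etas = []

    def ftran(self, a):
        """Return v = B^{-1} a (i.e. solve B v = a) by applying the etas in order."""
        if len(a) != self.m:
            raise ValueError("vector has the wrong length")
        v = [Fraction(x) for x in a]
        for r, eta in self.etas:
            vr = v[r]
            if vr == 0:
                continue                    # E leaves v unchanged when v_r = 0
            for i, e in eta.items():
                v[i] = e * vr if i == r else v[i] + e * vr
        return v

    def replace_column(self, r, a):
        """Basis change: column r of B is replaced by a (pivot v_r must be non-zero)."""
        v = self.ftran(a)
        if v[r] == 0:
            raise ValueError("pivot element is zero")
        eta = {r: 1 / v[r]}
        for i, vi in enumerate(v):
            if i != r and vi != 0:          # store only the non-zeros
                eta[i] = -vi / v[r]
        self.etas.append((r, eta))


eta_file = EtaFile(2)
eta_file.replace_column(0, [2, 1])   # B = [[2, 0], [1, 1]]
eta_file.replace_column(1, [1, 3])   # B = [[2, 1], [1, 3]]
print(eta_file.ftran([3, 4]))        # [Fraction(1, 1), Fraction(1, 1)]
```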


The  Role  of  Near  Triangularity  of  the  Basis 


The success of the simplex method in solving very large problems encountered in practice depends on two properties found in almost every practical problem. First, the basis is usually very sparse. Second, one can usually rearrange the rows and columns of the various bases encountered in the course of solution so that they are nearly triangular. Near triangularity makes it a relatively inexpensive operation to represent the basis as a product of lower and upper triangular matrices and to preserve much of the original sparsity. Even if the bases were very sparse but not nearly triangular, solving systems $Bv = A_{\cdot s}$ could be too costly to perform.
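A toy version of this rearrangement, under simplifying assumptions: it only detects a basis that is exactly a row-and-column permutation of a lower triangular matrix, by repeatedly taking a row that has a single non-zero among the columns not yet assigned (a “row singleton”); whatever cannot be ordered this way is returned as a leftover set of rows. Production codes use more elaborate orderings that also handle such leftover blocks, and the function names here are hypothetical. Once an order is found, $Bv = A_{\cdot s}$ is solved by substitution, touching only the stored non-zeros.

```python
from fractions import Fraction


def triangular_order(B):
    """Find a row/column pivot order that makes sparse B lower triangular.

    B: list of rows, each a dict {col: value}. Repeatedly takes a row with
    exactly one non-zero in the not-yet-pivoted columns. Returns
    (pivots, leftover_rows); leftover_rows is non-empty when no such order exists.
    """
    remaining_rows = set(range(len(B)))
    remaining_cols = set(range(len(B)))
    pivots = []
    progress = True
    while remaining_rows and progress:
        progress = False
        for i in sorted(remaining_rows):
            live = [j for j in B[i] if j in remaining_cols]
            if len(live) == 1:              # row singleton: pivot on (i, live[0])
                pivots.append((i, live[0]))
                remaining_rows.discard(i)
                remaining_cols.discard(live[0])
                progress = True
    return pivots, remaining_rows


def solve_triangular(B, a, pivots):
    """Substitution along the pivot order returned by triangular_order."""
    if len(pivots) != len(B):
        raise ValueError("B is not a permuted triangular matrix")
    v = {}
    for i, j in pivots:
        # every other non-zero in row i lies in an already-solved column
        rest = sum(Fraction(val) * v[c] for c, val in B[i].items() if c != j)
        v[j] = (Fraction(a[i]) - rest) / Fraction(B[i][j])
    return [v[j] for j in range(len(B))]


B = [{1: 2, 2: 1},           # row 0
     {1: 1},                 # row 1
     {0: 3, 1: 1, 2: 1}]     # row 2
pivots, leftover = triangular_order(B)
print(pivots, leftover)                        # [(1, 1), (0, 2), (2, 0)] set()
print(solve_triangular(B, [5, 2, 9], pivots))  # [Fraction(2, 1), Fraction(2, 1), Fraction(1, 1)]
```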

The success of solving linear programs therefore depends on a number of factors: (1) the power of computers, (2) extremely clever algorithms, but most of all (3) a lot of good luck that the matrices of practical problems will be very, very sparse and that their bases, after rearrangement, will be nearly triangular.

For  forty  years  the  simplex  method  has  reigned  supreme  as  the  preferred  method  for 
solving  linear  programs.  It  is  historically  the  reason  for  the  practical  success  of  the  field. 
As  of  this  writing,  however,  the  algorithm  is  being  challenged  by  new  interior  methods 
proposed  by  N.  Karmarkar  [15]  and  others,  and  by  methods  that  exploit  special  structure. 
If  these  new  methods  turn  out  to  be  more  successful  than  the  simplex  method  for  solving 
certain  practical  classes  of  problems,  I  predict  it  will  not  be  because  of  any  theoretical 
reasons  having  to  do  with  polynomial  time  but  because  they  can  more  effectively  exploit 
the  sparsity  and  near  triangularity  of  practical  problems  than  the  simplex  method  is  able 
to  do. 

Today we know that before 1947 five isolated papers had been published on special cases of the linear programming problem, by Monge (1781) [21], Fourier (1824) [5], de la Vallée Poussin (1911) [6], Kantorovich (1939) [7], and Hitchcock (1941) [8]. Fourier, de la Vallée Poussin, and Hitchcock proposed as a solution method descent along the outside edges of the polyhedral set, which is the way we describe the simplex method today. There is no evidence that these papers had any influence on each other. Evidently they sparked zero interest on the part of other mathematicians, an exception being a paper by Appell (1928) on Monge’s translocation of masses problem. These references were unknown to me when I first proposed the simplex method. As we shall see, the simplex algorithm evolved from a very different geometry, one in which it appeared to be very efficient.


